Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 197905 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 50.6 MiB |
| Average record size in memory | 268.0 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 3 |
| Boolean | 3 |
Alerts
| Alert | Type |
|---|---|
| tran_timestamp has a high cardinality: 720 distinct values | High cardinality |
| orig_acct is highly correlated with is_sar | High correlation |
| is_sar is highly correlated with orig_acct | High correlation |
| initial_deposit_bene is highly correlated with age_bene | High correlation |
| age_bene is highly correlated with initial_deposit_bene | High correlation |
| tran_id is uniformly distributed | Uniform |
| tran_id has unique values | Unique |
Reproduction
| Analysis started | 2022-09-04 13:13:25.899519 |
|---|---|
| Analysis finished | 2022-09-04 13:13:59.834044 |
| Duration | 33.93 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
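A report like this one could be regenerated roughly as follows. This is a minimal sketch, assuming the data lives in a CSV file; the helper name `build_profile`, the report title, and both file paths are hypothetical, and the configuration used for the original run (config.json) is not reproduced here.

```python
def build_profile(csv_path, html_path="report.html"):
    """Profile a CSV file and write the result as an HTML report."""
    # Imported lazily so that defining this helper does not itself
    # require pandas or pandas-profiling to be installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # hypothetical input file
    report = ProfileReport(df, title="Transaction profile")  # title is an assumption
    report.to_file(html_path)
    return report
```

Calling `build_profile("transactions.csv")` (a placeholder path) would write `report.html` next to the script.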
tran_id
Real number (ℝ≥0)
| Distinct | 197905 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 98953 |
| Minimum | 1 |
|---|---|
| Maximum | 197905 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 9896.2 |
| Q1 | 49477 |
| median | 98953 |
| Q3 | 148429 |
| 95-th percentile | 188009.8 |
| Maximum | 197905 |
| Range | 197904 |
| Interquartile range (IQR) | 98952 |
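The quantiles above are consistent with linear interpolation between neighbouring order statistics (the default in pandas' `quantile`). The sketch below illustrates that convention on the integers 1 to 197905, a stand-in for the strictly increasing ID column; the function name `percentile` is introduced here for illustration only.

```python
import math

def percentile(sorted_values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    n = len(sorted_values)
    pos = (q / 100) * (n - 1)          # fractional 0-based position
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)            # clamp at the last element
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

# Stand-in for a strictly increasing ID column (assumption: values 1..n).
ids = range(1, 197_905 + 1)

p5 = percentile(ids, 5)
q1 = percentile(ids, 25)
median = percentile(ids, 50)
q3 = percentile(ids, 75)
p95 = percentile(ids, 95)
iqr = q3 - q1
```

Under this convention the fractional 5th and 95th percentiles (9896.2 and 188009.8) arise because 5% and 95% of n - 1 are not whole numbers.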
Descriptive statistics
| Standard deviation | 57130.39685 |
|---|---|
| Coefficient of variation (CV) | 0.5773488105 |
| Kurtosis | -1.2 |
| Mean | 98953 |
| Median Absolute Deviation (MAD) | 49476 |
| Skewness | 0 |
| Sum | 1.958329346 × 10¹⁰ |
| Variance | 3263882244 |
| Monotonicity | Strictly increasing |
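Because the column is simply the integers 1 to n, its descriptive statistics have closed forms. The sketch below checks them against a direct computation; it assumes the reported variance is the sample variance (denominator n - 1), which is what the figures suggest.

```python
import math

n = 197_905  # number of observations reported in the overview

# Closed forms for the integers 1..n
mean = (n + 1) / 2
total = n * (n + 1) // 2
variance = n * (n + 1) / 12        # sample variance (denominator n - 1)
std = math.sqrt(variance)
cv = std / mean                    # coefficient of variation

# Direct computation for comparison
values = range(1, n + 1)
direct_mean = sum(values) / n
direct_var = sum((v - direct_mean) ** 2 for v in values) / (n - 1)
```

The closed-form sample variance n(n + 1)/12 follows from the population variance (n² - 1)/12 multiplied by n/(n - 1).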
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2049 | 1 | < 0.1% |
| 182924 | 1 | < 0.1% |
| 33477 | 1 | < 0.1% |
| 39622 | 1 | < 0.1% |
| 37575 | 1 | < 0.1% |
| 60104 | 1 | < 0.1% |
| 58057 | 1 | < 0.1% |
| 64202 | 1 | < 0.1% |
| 62155 | 1 | < 0.1% |
| 51916 | 1 | < 0.1% |
| Other values (197895) | 197895 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 |
| Value | Count | Frequency (%) |
| 197905 | 1 | |
| 197904 | 1 | |
| 197903 | 1 | |
| 197902 | 1 | |
| 197901 | 1 | |
| 197900 | 1 | |
| 197899 | 1 | |
| 197898 | 1 | |
| 197897 | 1 | |
| 197896 | 1 |
orig_acct
Real number (ℝ≥0)
| Distinct | 2090 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1909.848776 |
| Minimum | 0 |
|---|---|
| Maximum | 12007 |
| Zeros | 103 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 95 |
| Q1 | 461 |
| median | 2098 |
| Q3 | 2738 |
| 95-th percentile | 4711 |
| Maximum | 12007 |
| Range | 12007 |
| Interquartile range (IQR) | 2277 |
Descriptive statistics
| Standard deviation | 1618.01312 |
|---|---|
| Coefficient of variation (CV) | 0.8471943644 |
| Kurtosis | -0.5802615119 |
| Mean | 1909.848776 |
| Median Absolute Deviation (MAD) | 1545 |
| Skewness | 0.5934536229 |
| Sum | 377968622 |
| Variance | 2617966.456 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2486 | 310 | 0.2% |
| 2584 | 278 | 0.1% |
| 2696 | 207 | 0.1% |
| 2671 | 207 | 0.1% |
| 533 | 207 | 0.1% |
| 2751 | 207 | 0.1% |
| 654 | 206 | 0.1% |
| 392 | 206 | 0.1% |
| 485 | 206 | 0.1% |
| 570 | 206 | 0.1% |
| Other values (2080) | 195665 |
| Value | Count | Frequency (%) |
| 0 | 103 | |
| 1 | 104 | |
| 2 | 103 | |
| 3 | 103 | |
| 4 | 103 | |
| 5 | 103 | |
| 6 | 103 | |
| 7 | 103 | |
| 8 | 111 | |
| 9 | 102 |
| Value | Count | Frequency (%) |
| 12007 | 1 | |
| 11990 | 1 | |
| 11986 | 1 | |
| 11974 | 1 | |
| 11968 | 1 | |
| 11871 | 1 | |
| 11858 | 1 | |
| 11840 | 1 | |
| 11837 | 1 | |
| 11827 | 1 |
bene_acct
Real number (ℝ≥0)
| Distinct | 4077 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 569.6764811 |
| Minimum | 0 |
|---|---|
| Maximum | 11991 |
| Zeros | 21 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 24 |
| median | 53 |
| Q3 | 191 |
| 95-th percentile | 4149 |
| Maximum | 11991 |
| Range | 11991 |
| Interquartile range (IQR) | 167 |
Descriptive statistics
| Standard deviation | 1695.983714 |
|---|---|
| Coefficient of variation (CV) | 2.977099757 |
| Kurtosis | 19.95378574 |
| Mean | 569.6764811 |
| Median Absolute Deviation (MAD) | 38 |
| Skewness | 4.364975752 |
| Sum | 112741824 |
| Variance | 2876360.758 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 25 | 3563 | 1.8% |
| 20 | 3263 | 1.6% |
| 13 | 3210 | 1.6% |
| 14 | 3103 | 1.6% |
| 27 | 3017 | 1.5% |
| 17 | 2953 | 1.5% |
| 24 | 2917 | 1.5% |
| 23 | 2917 | 1.5% |
| 18 | 2800 | 1.4% |
| 12 | 2682 | 1.4% |
| Other values (4067) | 167480 |
| Value | Count | Frequency (%) |
| 0 | 21 | < 0.1% |
| 1 | 293 | 0.1% |
| 2 | 420 | 0.2% |
| 3 | 1342 | |
| 4 | 859 | |
| 5 | 1295 | |
| 6 | 1378 | |
| 7 | 1746 | |
| 8 | 1659 | |
| 9 | 1563 |
| Value | Count | Frequency (%) |
| 11991 | 1 | < 0.1% |
| 11990 | 1 | < 0.1% |
| 11974 | 1 | < 0.1% |
| 11876 | 3 | < 0.1% |
| 11871 | 1 | < 0.1% |
| 11858 | 1 | < 0.1% |
| 11840 | 1 | < 0.1% |
| 11822 | 70 | |
| 11746 | 1 | < 0.1% |
| 11702 | 1 | < 0.1% |
base_amt
Real number (ℝ≥0)
| Distinct | 80654 |
|---|---|
| Distinct (%) | 40.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 546.6319096 |
| Minimum | 0.09 |
|---|---|
| Maximum | 999.99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0.09 |
|---|---|
| 5-th percentile | 140.992 |
| Q1 | 319.58 |
| median | 546.78 |
| Q3 | 772.29 |
| 95-th percentile | 954.37 |
| Maximum | 999.99 |
| Range | 999.9 |
| Interquartile range (IQR) | 452.71 |
Descriptive statistics
| Standard deviation | 261.6711649 |
|---|---|
| Coefficient of variation (CV) | 0.4786972006 |
| Kurtosis | -1.197072321 |
| Mean | 546.6319096 |
| Median Absolute Deviation (MAD) | 226.38 |
| Skewness | 0.0007397461412 |
| Sum | 108181188.1 |
| Variance | 68471.79854 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 565.51 | 11 | < 0.1% |
| 247.47 | 10 | < 0.1% |
| 124.69 | 10 | < 0.1% |
| 553.71 | 10 | < 0.1% |
| 988.77 | 10 | < 0.1% |
| 166.96 | 10 | < 0.1% |
| 110.26 | 10 | < 0.1% |
| 855.83 | 10 | < 0.1% |
| 122.84 | 10 | < 0.1% |
| 122.79 | 9 | < 0.1% |
| Other values (80644) | 197805 |
| Value | Count | Frequency (%) |
| 0.09 | 1 | |
| 0.16 | 1 | |
| 0.25 | 2 | |
| 0.43 | 1 | |
| 0.52 | 1 | |
| 0.61 | 1 | |
| 0.91 | 1 | |
| 0.94 | 1 | |
| 1.37 | 1 | |
| 1.5 | 1 |
| Value | Count | Frequency (%) |
| 999.99 | 1 | < 0.1% |
| 999.98 | 3 | |
| 999.97 | 5 | |
| 999.96 | 1 | < 0.1% |
| 999.95 | 4 | |
| 999.94 | 1 | < 0.1% |
| 999.93 | 3 | |
| 999.91 | 1 | < 0.1% |
| 999.9 | 1 | < 0.1% |
| 999.89 | 4 |
tran_timestamp
Categorical
| Distinct | 720 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.5 MiB |
Length
| Max length | 20 |
|---|---|
| Median length | 20 |
| Mean length | 20 |
| Min length | 20 |
Characters and Unicode
| Total characters | 3958100 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2017-01-01T00:00:00Z |
|---|---|
| 2nd row | 2017-01-01T00:00:00Z |
| 3rd row | 2017-01-01T00:00:00Z |
| 4th row | 2017-01-01T00:00:00Z |
| 5th row | 2017-01-01T00:00:00Z |
Common Values
| Value | Count | Frequency (%) |
| 2017-05-24T00:00:00Z | 363 | 0.2% |
| 2017-04-26T00:00:00Z | 363 | 0.2% |
| 2017-02-15T00:00:00Z | 361 | 0.2% |
| 2017-06-14T00:00:00Z | 359 | 0.2% |
| 2017-05-03T00:00:00Z | 359 | 0.2% |
| 2017-02-08T00:00:00Z | 359 | 0.2% |
| 2017-06-21T00:00:00Z | 359 | 0.2% |
| 2017-02-01T00:00:00Z | 359 | 0.2% |
| 2017-04-05T00:00:00Z | 358 | 0.2% |
| 2017-04-19T00:00:00Z | 358 | 0.2% |
| Other values (710) | 194307 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 2017-05-24t00:00:00z | 363 | 0.2% |
| 2017-04-26t00:00:00z | 363 | 0.2% |
| 2017-02-15t00:00:00z | 361 | 0.2% |
| 2017-05-03t00:00:00z | 359 | 0.2% |
| 2017-02-08t00:00:00z | 359 | 0.2% |
| 2017-06-21t00:00:00z | 359 | 0.2% |
| 2017-02-01t00:00:00z | 359 | 0.2% |
| 2017-06-14t00:00:00z | 359 | 0.2% |
| 2017-01-04t00:00:00z | 358 | 0.2% |
| 2017-02-22t00:00:00z | 358 | 0.2% |
| Other values (710) | 194307 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1633970 | |
| - | 395810 | 10.0% |
| : | 395810 | 10.0% |
| 1 | 364403 | 9.2% |
| 2 | 310090 | 7.8% |
| T | 197905 | 5.0% |
| Z | 197905 | 5.0% |
| 7 | 155858 | 3.9% |
| 8 | 114761 | 2.9% |
| 3 | 46883 | 1.2% |
| Other values (4) | 144705 | 3.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2770670 | |
| Dash Punctuation | 395810 | 10.0% |
| Other Punctuation | 395810 | 10.0% |
| Uppercase Letter | 395810 | 10.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1633970 | |
| 1 | 364403 | 13.2% |
| 2 | 310090 | 11.2% |
| 7 | 155858 | 5.6% |
| 8 | 114761 | 4.1% |
| 3 | 46883 | 1.7% |
| 5 | 37164 | 1.3% |
| 4 | 36940 | 1.3% |
| 6 | 36153 | 1.3% |
| 9 | 34448 | 1.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 197905 | |
| Z | 197905 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 395810 |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 395810 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3562290 | |
| Latin | 395810 | 10.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1633970 | |
| - | 395810 | 11.1% |
| : | 395810 | 11.1% |
| 1 | 364403 | 10.2% |
| 2 | 310090 | 8.7% |
| 7 | 155858 | 4.4% |
| 8 | 114761 | 3.2% |
| 3 | 46883 | 1.3% |
| 5 | 37164 | 1.0% |
| 4 | 36940 | 1.0% |
| Other values (2) | 70601 | 2.0% |
Latin
| Value | Count | Frequency (%) |
| T | 197905 | |
| Z | 197905 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3958100 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1633970 | |
| - | 395810 | 10.0% |
| : | 395810 | 10.0% |
| 1 | 364403 | 9.2% |
| 2 | 310090 | 7.8% |
| T | 197905 | 5.0% |
| Z | 197905 | 5.0% |
| 7 | 155858 | 3.9% |
| 8 | 114761 | 2.9% |
| 3 | 46883 | 1.2% |
| Other values (4) | 144705 | 3.7% |
is_sar
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 193.4 KiB |
| Value | Count | Frequency (%) |
| False | 197234 | |
| True | 671 | 0.3% |
prior_sar_count_orig
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 193.4 KiB |
| Value | Count | Frequency (%) |
| False | 174324 | |
| True | 23581 | 11.9% |
initial_deposit_orig
Real number (ℝ≥0)
| Distinct | 2090 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 75597.28015 |
| Minimum | 50009.28 |
|---|---|
| Maximum | 99999.31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 50009.28 |
|---|---|
| 5-th percentile | 53209.65 |
| Q1 | 63514.54 |
| median | 76142.4 |
| Q3 | 87290.86 |
| 95-th percentile | 97289.61 |
| Maximum | 99999.31 |
| Range | 49990.03 |
| Interquartile range (IQR) | 23776.32 |
Descriptive statistics
| Standard deviation | 13970.57272 |
|---|---|
| Coefficient of variation (CV) | 0.1848025841 |
| Kurtosis | -1.157314352 |
| Mean | 75597.28015 |
| Median Absolute Deviation (MAD) | 11870.37 |
| Skewness | -0.04921636702 |
| Sum | 1.496107973 × 10¹⁰ |
| Variance | 195176902.1 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 87199.11 | 310 | 0.2% |
| 62288.74 | 278 | 0.1% |
| 90983.71 | 207 | 0.1% |
| 92422.42 | 207 | 0.1% |
| 83668.85 | 207 | 0.1% |
| 77381.5 | 207 | 0.1% |
| 68754.28 | 206 | 0.1% |
| 92961.44 | 206 | 0.1% |
| 97514.84 | 206 | 0.1% |
| 75858.51 | 206 | 0.1% |
| Other values (2080) | 195665 |
| Value | Count | Frequency (%) |
| 50009.28 | 1 | < 0.1% |
| 50050 | 1 | < 0.1% |
| 50058.6 | 95 | |
| 50060.39 | 1 | < 0.1% |
| 50110.37 | 103 | |
| 50255.49 | 94 | |
| 50261.31 | 108 | |
| 50291.36 | 1 | < 0.1% |
| 50295.91 | 92 | |
| 50316.89 | 130 |
| Value | Count | Frequency (%) |
| 99999.31 | 103 | |
| 99951.81 | 103 | |
| 99942.06 | 103 | |
| 99932.76 | 198 | |
| 99928.3 | 197 | |
| 99915.4 | 1 | < 0.1% |
| 99870.09 | 103 | |
| 99857.44 | 103 | |
| 99703.4 | 103 | |
| 99681.7 | 1 | < 0.1% |
gender_orig
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.7 MiB |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.012607059 |
| Min length | 4 |
Characters and Unicode
| Total characters | 992020 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Female |
|---|---|
| 2nd row | Male |
| 3rd row | Female |
| 4th row | Female |
| 5th row | Female |
Common Values
| Value | Count | Frequency (%) |
| Female | 100200 | |
| Male | 97705 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| female | 100200 | |
| male | 97705 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 298105 | |
| a | 197905 | |
| l | 197905 | |
| F | 100200 | 10.1% |
| m | 100200 | 10.1% |
| M | 97705 | 9.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 794115 | |
| Uppercase Letter | 197905 | 19.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 298105 | |
| a | 197905 | |
| l | 197905 | |
| m | 100200 | 12.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| F | 100200 | |
| M | 97705 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 992020 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 298105 | |
| a | 197905 | |
| l | 197905 | |
| F | 100200 | 10.1% |
| m | 100200 | 10.1% |
| M | 97705 | 9.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 992020 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 298105 | |
| a | 197905 | |
| l | 197905 | |
| F | 100200 | 10.1% |
| m | 100200 | 10.1% |
| M | 97705 | 9.8% |
prior_sar_count_bene
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 193.4 KiB |
| Value | Count | Frequency (%) |
| False | 162053 | |
| True | 35852 | 18.1% |
initial_deposit_bene
Real number (ℝ≥0)
| Distinct | 4077 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 74310.10622 |
| Minimum | 50001.55 |
|---|---|
| Maximum | 99999.31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 50001.55 |
|---|---|
| 5-th percentile | 52405.77 |
| Q1 | 62535.57 |
| median | 71210.53 |
| Q3 | 88048.5 |
| 95-th percentile | 97896.53 |
| Maximum | 99999.31 |
| Range | 49997.76 |
| Interquartile range (IQR) | 25512.93 |
Descriptive statistics
| Standard deviation | 14760.15747 |
|---|---|
| Coefficient of variation (CV) | 0.198629207 |
| Kurtosis | -1.211244726 |
| Mean | 74310.10622 |
| Median Absolute Deviation (MAD) | 13116.03 |
| Skewness | 0.1628954081 |
| Sum | 1.470634157 × 10¹⁰ |
| Variance | 217862248.5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 67858.54 | 3563 | 1.8% |
| 51562.38 | 3263 | 1.6% |
| 64631.3 | 3210 | 1.6% |
| 97896.53 | 3103 | 1.6% |
| 56983.45 | 3017 | 1.5% |
| 62535.57 | 2953 | 1.5% |
| 89199.14 | 2917 | 1.5% |
| 91434.67 | 2917 | 1.5% |
| 62618.6 | 2800 | 1.4% |
| 52909.53 | 2682 | 1.4% |
| Other values (4067) | 167480 |
| Value | Count | Frequency (%) |
| 50001.55 | 2 | < 0.1% |
| 50003.89 | 5 | < 0.1% |
| 50020.25 | 5 | < 0.1% |
| 50028.14 | 5 | < 0.1% |
| 50034.04 | 1 | < 0.1% |
| 50050 | 2 | < 0.1% |
| 50057.7 | 8 | |
| 50058.6 | 1 | < 0.1% |
| 50066.61 | 2 | < 0.1% |
| 50070.54 | 17 |
| Value | Count | Frequency (%) |
| 99999.31 | 7 | < 0.1% |
| 99994.47 | 2 | < 0.1% |
| 99984.68 | 10 | < 0.1% |
| 99961.38 | 2 | < 0.1% |
| 99951.81 | 4 | < 0.1% |
| 99932.76 | 14 | < 0.1% |
| 99928.3 | 11 | < 0.1% |
| 99918.74 | 107 | |
| 99915.4 | 1 | < 0.1% |
| 99868.47 | 168 |
gender_bene
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.7 MiB |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.014072408 |
| Min length | 4 |
Characters and Unicode
| Total characters | 992310 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Female |
|---|---|
| 2nd row | Female |
| 3rd row | Male |
| 4th row | Female |
| 5th row | Female |
Common Values
| Value | Count | Frequency (%) |
| Female | 100345 | |
| Male | 97560 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| female | 100345 | |
| male | 97560 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 298250 | |
| a | 197905 | |
| l | 197905 | |
| F | 100345 | 10.1% |
| m | 100345 | 10.1% |
| M | 97560 | 9.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 794405 | |
| Uppercase Letter | 197905 | 19.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 298250 | |
| a | 197905 | |
| l | 197905 | |
| m | 100345 | 12.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| F | 100345 | |
| M | 97560 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 992310 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 298250 | |
| a | 197905 | |
| l | 197905 | |
| F | 100345 | 10.1% |
| m | 100345 | 10.1% |
| M | 97560 | 9.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 992310 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 298250 | |
| a | 197905 | |
| l | 197905 | |
| F | 100345 | 10.1% |
| m | 100345 | 10.1% |
| M | 97560 | 9.8% |
age_orig
Real number (ℝ≥0)
| Distinct | 117 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 59.89496476 |
| Minimum | 0 |
|---|---|
| Maximum | 116 |
| Zeros | 801 |
| Zeros (%) | 0.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7 |
| Q1 | 31 |
| median | 62 |
| Q3 | 90 |
| 95-th percentile | 110 |
| Maximum | 116 |
| Range | 116 |
| Interquartile range (IQR) | 59 |
Descriptive statistics
| Standard deviation | 33.73379436 |
|---|---|
| Coefficient of variation (CV) | 0.5632158646 |
| Kurtosis | -1.221810746 |
| Mean | 59.89496476 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | -0.07009563801 |
| Sum | 11853513 |
| Variance | 1137.968882 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 114 | 2743 | 1.4% |
| 109 | 2591 | 1.3% |
| 52 | 2575 | 1.3% |
| 82 | 2539 | 1.3% |
| 73 | 2529 | 1.3% |
| 86 | 2495 | 1.3% |
| 12 | 2493 | 1.3% |
| 10 | 2478 | 1.3% |
| 64 | 2420 | 1.2% |
| 69 | 2359 | 1.2% |
| Other values (107) | 172683 |
| Value | Count | Frequency (%) |
| 0 | 801 | 0.4% |
| 1 | 2232 | |
| 2 | 1474 | |
| 3 | 961 | |
| 4 | 952 | |
| 5 | 1584 | |
| 6 | 1384 | |
| 7 | 1885 | |
| 8 | 1141 | |
| 9 | 1729 |
| Value | Count | Frequency (%) |
| 116 | 1018 | 0.5% |
| 115 | 1096 | 0.6% |
| 114 | 2743 | |
| 113 | 1248 | |
| 112 | 1740 | |
| 111 | 2016 | |
| 110 | 1944 | |
| 109 | 2591 | |
| 108 | 1653 | |
| 107 | 2226 |
age_bene
Real number (ℝ≥0)
| Distinct | 117 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.44603724 |
| Minimum | 0 |
|---|---|
| Maximum | 116 |
| Zeros | 632 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7 |
| Q1 | 31 |
| median | 61 |
| Q3 | 83 |
| 95-th percentile | 108 |
| Maximum | 116 |
| Range | 116 |
| Interquartile range (IQR) | 52 |
Descriptive statistics
| Standard deviation | 31.93012218 |
|---|---|
| Coefficient of variation (CV) | 0.5463180001 |
| Kurtosis | -1.125626711 |
| Mean | 58.44603724 |
| Median Absolute Deviation (MAD) | 25 |
| Skewness | -0.0850688605 |
| Sum | 11566763 |
| Variance | 1019.532702 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 23 | 6248 | 3.2% |
| 78 | 5746 | 2.9% |
| 54 | 5559 | 2.8% |
| 51 | 5535 | 2.8% |
| 74 | 4400 | 2.2% |
| 49 | 4396 | 2.2% |
| 80 | 4395 | 2.2% |
| 73 | 4285 | 2.2% |
| 64 | 4132 | 2.1% |
| 75 | 3964 | 2.0% |
| Other values (107) | 149245 |
| Value | Count | Frequency (%) |
| 0 | 632 | 0.3% |
| 1 | 750 | 0.4% |
| 2 | 2786 | |
| 3 | 553 | 0.3% |
| 4 | 1429 | |
| 5 | 2441 | |
| 6 | 836 | 0.4% |
| 7 | 1415 | |
| 8 | 2493 | |
| 9 | 347 | 0.2% |
| Value | Count | Frequency (%) |
| 116 | 430 | 0.2% |
| 115 | 788 | 0.4% |
| 114 | 497 | 0.3% |
| 113 | 415 | 0.2% |
| 112 | 410 | 0.2% |
| 111 | 2302 | |
| 110 | 782 | 0.4% |
| 109 | 2445 | |
| 108 | 3026 | |
| 107 | 679 | 0.3% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
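The definition of ρ can be sketched in plain Python by ranking both variables and then applying the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are illustrative assumptions; tied values receive their average rank, which is a common convention but not stated in the report.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Illustrative data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

For this monotonic but nonlinear relationship, ρ reaches 1 while Pearson's r stays below 1.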
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
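A minimal sketch of this calculation, including a check of the location and scale invariance mentioned above. The function name `pearson_r` and the sample values are illustrative, not taken from the dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling each variable separately leaves r unchanged
# (as long as both scale factors are positive).
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
```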
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
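The pair-counting definition above corresponds to the variant often called tau-a. The sketch below implements exactly that formula; it is quadratic in the number of observations and does not apply the tie corrections that library implementations commonly add, so it may differ from the values in the report when ties are present. The function name `kendall_tau_a` is an assumption for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1      # both variables move in the same direction
        elif s < 0:
            discordant += 1      # the variables move in opposite directions
        # s == 0 means a tie in x or y; it counts toward neither group
    return (concordant - discordant) / len(pairs)

# Illustrative data: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
tau_perfect = kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10])
```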
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
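The per-column nullity counts behind such displays can be sketched in a few lines. The helper name `missing_by_column` and the sample rows are hypothetical; the gaps are inserted only for illustration, since the profiled dataset itself reports zero missing cells.

```python
def missing_by_column(rows):
    """Count missing values (None or NaN) per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            # NaN is the only float that is not equal to itself.
            is_missing = val is None or (isinstance(val, float) and val != val)
            counts[col] = counts.get(col, 0) + int(is_missing)
    return counts

# Hypothetical rows with artificial gaps, for illustration only.
rows = [
    {"tran_id": 1, "base_amt": 885.30, "gender_orig": "Female"},
    {"tran_id": 2, "base_amt": None, "gender_orig": "Male"},
    {"tran_id": 3, "base_amt": float("nan"), "gender_orig": None},
]
counts = missing_by_column(rows)
```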
First rows
| | tran_id | orig_acct | bene_acct | base_amt | tran_timestamp | is_sar | prior_sar_count_orig | initial_deposit_orig | gender_orig | prior_sar_count_bene | initial_deposit_bene | gender_bene | age_orig | age_bene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4376 | 170 | 885.30 | 2017-01-01T00:00:00Z | False | False | 63446.28 | Female | False | 84168.61 | Female | 106 | 43 |
| 1 | 2 | 4300 | 23 | 630.41 | 2017-01-01T00:00:00Z | False | False | 79684.15 | Male | False | 89199.14 | Female | 58 | 41 |
| 2 | 3 | 4433 | 12 | 393.14 | 2017-01-01T00:00:00Z | False | False | 64630.28 | Female | False | 52909.53 | Male | 114 | 51 |
| 3 | 4 | 2552 | 6503 | 659.74 | 2017-01-01T00:00:00Z | False | False | 79188.34 | Female | False | 57537.45 | Female | 35 | 27 |
| 4 | 5 | 2552 | 6503 | 442.44 | 2017-01-01T00:00:00Z | False | False | 79188.34 | Female | False | 57537.45 | Female | 35 | 27 |
| 5 | 6 | 281 | 75 | 140.06 | 2017-01-01T00:00:00Z | False | True | 87074.80 | Female | False | 50335.69 | Male | 78 | 53 |
| 6 | 7 | 553 | 400 | 612.57 | 2017-01-01T00:00:00Z | False | False | 76295.34 | Female | False | 85853.72 | Female | 108 | 24 |
| 7 | 8 | 240 | 990 | 665.62 | 2017-01-01T00:00:00Z | False | False | 76874.34 | Male | False | 70044.46 | Male | 88 | 60 |
| 8 | 9 | 668 | 132 | 970.50 | 2017-01-01T00:00:00Z | False | False | 60794.76 | Female | False | 60784.17 | Male | 18 | 14 |
| 9 | 10 | 2085 | 104 | 945.46 | 2017-01-01T00:00:00Z | False | False | 88190.98 | Male | True | 67881.53 | Male | 62 | 78 |
Last rows
| | tran_id | orig_acct | bene_acct | base_amt | tran_timestamp | is_sar | prior_sar_count_orig | initial_deposit_orig | gender_orig | prior_sar_count_bene | initial_deposit_bene | gender_bene | age_orig | age_bene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 197895 | 197896 | 443 | 64 | 475.66 | 2018-12-21T00:00:00Z | False | False | 53563.64 | Female | False | 91528.28 | Male | 15 | 60 |
| 197896 | 197897 | 4651 | 6029 | 632.00 | 2018-12-21T00:00:00Z | False | False | 74115.62 | Male | False | 95516.71 | Male | 1 | 37 |
| 197897 | 197898 | 265 | 14 | 188.37 | 2018-12-21T00:00:00Z | False | False | 50781.53 | Male | False | 97896.53 | Female | 115 | 54 |
| 197898 | 197899 | 105 | 13 | 116.02 | 2018-12-21T00:00:00Z | False | True | 82521.54 | Female | True | 64631.30 | Male | 78 | 49 |
| 197899 | 197900 | 675 | 270 | 502.13 | 2018-12-21T00:00:00Z | False | False | 70050.90 | Female | True | 64211.03 | Female | 115 | 57 |
| 197900 | 197901 | 2948 | 51 | 675.30 | 2018-12-21T00:00:00Z | False | False | 73456.24 | Male | False | 71122.41 | Male | 49 | 11 |
| 197901 | 197902 | 276 | 7688 | 864.55 | 2018-12-21T00:00:00Z | False | False | 58993.69 | Female | False | 64648.90 | Male | 115 | 54 |
| 197902 | 197903 | 885 | 198 | 682.87 | 2018-12-21T00:00:00Z | False | False | 51234.09 | Female | True | 85152.71 | Female | 78 | 114 |
| 197903 | 197904 | 4278 | 639 | 780.68 | 2018-12-21T00:00:00Z | False | False | 66156.31 | Male | False | 96278.77 | Male | 25 | 10 |
| 197904 | 197905 | 501 | 42 | 459.01 | 2018-12-21T00:00:00Z | False | False | 88951.73 | Female | False | 70573.82 | Male | 103 | 111 |